64 research outputs found

    SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields

    Full text link
3D reconstruction from 2D images has been extensively studied, usually with depth supervision at training time. To relax the dependence on costly-to-acquire datasets, we propose SceneRF, a self-supervised monocular scene reconstruction method that uses only posed image sequences for training. Fueled by recent progress in neural radiance fields (NeRF), we optimize a radiance field, though with explicit depth optimization and a novel probabilistic sampling strategy to efficiently handle large scenes. At inference, a single input image suffices to hallucinate novel depth views, which are fused together to obtain the 3D scene reconstruction. Thorough experiments demonstrate that we outperform all recent baselines for novel depth view synthesis and scene reconstruction, on indoor BundleFusion and outdoor SemanticKITTI. Our code is available at https://astra-vision.github.io/SceneRF.
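
The abstract does not detail the probabilistic sampling strategy, so the sketch below is only a rough illustration under assumptions of our own (the function name, the uniform/Gaussian mixture, and all hyper-parameters are hypothetical, not SceneRF's actual sampler): ray samples are concentrated around a predicted depth while part of the budget still covers the full range, the kind of trade-off needed to sample large scenes efficiently.

```python
import numpy as np

def sample_ray_depths(depth_mean, depth_std, near, far, n_samples=64, frac_gauss=0.5, rng=None):
    """Hypothetical ray sampler: mix uniform coverage of [near, far] with samples
    concentrated around a predicted surface depth (Gaussian)."""
    rng = rng or np.random.default_rng()
    n_gauss = int(n_samples * frac_gauss)                    # samples spent near the expected surface
    uniform = rng.uniform(near, far, size=n_samples - n_gauss)
    gauss = rng.normal(depth_mean, depth_std, size=n_gauss)
    depths = np.clip(np.concatenate([uniform, gauss]), near, far)
    return np.sort(depths)

# Example: 64 depth samples along one ray with an expected surface at 12 m.
print(sample_ray_depths(depth_mean=12.0, depth_std=1.5, near=0.5, far=80.0))
```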

    Vision for Scene Understanding

    Get PDF
This manuscript covers my recent research on vision algorithms for scene understanding, articulated around three research axes: 3D vision, weakly supervised vision, and vision and physics. At the core of the most recent works are weakly-supervised learning and physics-embodied vision, which address shortcomings of supervised learning, namely its need for large amounts of data. The use of more physically grounded algorithms is evidently beneficial, as both robots and humans naturally evolve in a 3D physical world. On the other hand, accounting for physics knowledge reflects important cues about the lighting and weather conditions of the scene, which are central to my work. Physics-informed machine learning is not only beneficial for increased interpretability but also compensates for label and data scarcity.

    Detection of Unfocused Raindrops on a Windscreen using Low Level Image Processing

    No full text
In a scene, rain produces a complex set of visual effects. Such effects may cause failures in outdoor vision-based systems, which can have important consequences for safety applications. For these applications, rain detection would be useful to adjust their reliability. In this paper, we introduce the largely unaddressed problem of unfocused raindrops. We then present a first approach to detect these unfocused raindrops on a transparent screen, using a spatio-temporal method that achieves detection in real time. We successfully tested our algorithm for Intelligent Transport Systems (ITS) using an on-board camera, detecting the raindrops on the windscreen. Our algorithm differs from others in that it does not require the focus to be set on the windscreen, which means it can run on the same camera sensor as other vision-based algorithms.
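
As a rough illustration only (not the paper's actual pipeline; the low-variance cue, function name, and threshold are assumptions), one spatio-temporal intuition is that on a moving vehicle the in-focus scene changes quickly between frames, whereas pixels behind an unfocused raindrop average the background and vary less, so persistently low temporal variance can flag candidate regions:

```python
import numpy as np

def raindrop_candidates(frames, var_ratio=0.25):
    """Illustrative spatio-temporal cue (assumption, not the published method):
    pixels behind an unfocused raindrop vary less over time than the moving scene."""
    stack = np.stack(frames).astype(np.float32)      # (T, H, W) grayscale frames
    variance = stack.var(axis=0)                     # per-pixel temporal variance
    return variance < var_ratio * variance.mean()    # low-variance pixels -> raindrop candidates
```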

    Model-based occlusion disentanglement for image-to-image translation

    Full text link
Image-to-image translation is affected by entanglement phenomena, which may occur when the target data encompass occlusions such as raindrops, dirt, etc. Our unsupervised model-based learning disentangles the scene and occlusions, while benefiting from an adversarial pipeline to regress the physical parameters of the occlusion model. The experiments demonstrate that our method is able to handle varying types of occlusions and generates highly realistic translations, qualitatively and quantitatively outperforming the state of the art on multiple datasets.
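
To make the notion of a parametric occlusion model concrete, here is a minimal sketch for a grayscale image (the mask/opacity/blur parameterization and all names are assumptions, not the paper's exact model); in a model-based disentanglement pipeline, such physical parameters would be regressed by the adversarial branch rather than fixed by hand:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def composite_occlusion(clean_img, occlusion_mask, opacity=0.7, blur_sigma=4.0):
    """Toy parametric occlusion model: the occluder (raindrop, dirt) blurs the scene
    behind it and is alpha-blended over the clean image."""
    blurred = gaussian_filter(clean_img, sigma=blur_sigma)   # scene seen through the occluder
    alpha = np.clip(opacity * occlusion_mask, 0.0, 1.0)      # per-pixel blend weight
    return (1.0 - alpha) * clean_img + alpha * blurred
```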

    LMSCNet: Lightweight Multiscale 3D Semantic Completion

    Full text link
We introduce a new approach for multiscale 3D semantic scene completion from a sparse 3D occupancy grid, such as voxelized LiDAR scans. As opposed to the literature, we use a 2D UNet backbone with comprehensive multiscale skip connections to enhance feature flow, along with 3D segmentation heads. On the SemanticKITTI benchmark, our method performs on par for semantic completion and better for completion than all other published methods, while being significantly lighter and faster. As such, it provides a great performance/speed trade-off for mobile-robotics applications. The ablation studies demonstrate that our method is robust to lower-density inputs and that it enables very high-speed semantic completion at the coarsest level. Qualitative results of our approach are provided at http://tiny.cc/lmscnet.
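
A toy sketch of the underlying idea of pairing a cheap 2D backbone with 3D-aware outputs, where the vertical axis of the voxel grid is folded into 2D channels and a head unfolds features back into per-voxel class scores (layer counts, sizes, and names are assumptions; this is not the published LMSCNet architecture):

```python
import torch
import torch.nn as nn

class TinySemanticCompletion(nn.Module):
    """Toy 2D-backbone semantic completion: the height axis becomes the channel axis."""
    def __init__(self, height=32, n_classes=20, feat=64):
        super().__init__()
        self.backbone = nn.Sequential(                         # 2D convs over the ground plane
            nn.Conv2d(height, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(feat, height * n_classes, 1)     # per-voxel class logits
        self.height, self.n_classes = height, n_classes

    def forward(self, occupancy):                              # occupancy: (B, height, X, Y)
        logits = self.head(self.backbone(occupancy))
        B, _, X, Y = logits.shape
        return logits.view(B, self.n_classes, self.height, X, Y)

# Example: complete a 32x128x128 occupancy grid into 20 semantic classes.
scores = TinySemanticCompletion()(torch.zeros(1, 32, 128, 128))
```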

    COARSE3D: Class-Prototypes for Contrastive Learning in Weakly-Supervised 3D Point Cloud Segmentation

    Full text link
Annotation of large-scale 3D data is notoriously cumbersome and costly. As an alternative, weakly-supervised learning alleviates this need by reducing the annotation effort by several orders of magnitude. We propose COARSE3D, a novel architecture-agnostic contrastive learning strategy for 3D segmentation. Since contrastive learning requires rich and diverse examples as keys and anchors, we leverage a prototype memory bank that efficiently captures class-wise, dataset-wide information in a small number of prototypes acting as keys. An entropy-driven sampling technique then allows us to select good pixels from predictions as anchors. Experiments on three projection-based backbones show that we outperform baselines on three challenging real-world outdoor datasets, working with as little as 0.001% of the annotations.
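
A minimal sketch of prototype-based contrastive learning with entropy-driven anchor selection (the selection rule, hyper-parameters, and names are assumptions, not the exact COARSE3D loss): confident, low-entropy predictions are kept as anchors, and class prototypes from a memory bank act as keys.

```python
import torch
import torch.nn.functional as F

def prototype_contrast(features, probs, prototypes, pseudo_labels, temperature=0.1, top_frac=0.2):
    """Illustrative loss: anchors are the most confident points; keys are class prototypes."""
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)   # (N,) prediction entropy
    k = max(1, int(top_frac * entropy.numel()))
    idx = entropy.topk(k, largest=False).indices                  # low-entropy points as anchors
    anchors = F.normalize(features[idx], dim=1)                   # (k, D)
    keys = F.normalize(prototypes, dim=1)                         # (C, D) memory-bank prototypes
    logits = anchors @ keys.t() / temperature                     # anchor-to-prototype similarities
    return F.cross_entropy(logits, pseudo_labels[idx])            # pull anchors to their class prototype
```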

    ManiFest: Manifold Deformation for Few-shot Image Translation

    Full text link
Most image-to-image translation methods require a large number of training images, which restricts their applicability. We instead propose ManiFest, a framework for few-shot image translation that learns a context-aware representation of a target domain from only a few images. To enforce feature consistency, our framework learns a style manifold between the source and proxy anchor domains (assumed to be composed of large numbers of images). The learned manifold is interpolated and deformed towards the few-shot target domain via patch-based adversarial and feature-statistics alignment losses. All of these components are trained simultaneously in a single end-to-end loop. In addition to the general few-shot translation task, our approach can alternatively be conditioned on a single exemplar image to reproduce its specific style. Extensive experiments demonstrate the efficacy of ManiFest on multiple tasks, outperforming the state of the art on all metrics and in both the general and exemplar-based scenarios. Our code is available at https://github.com/cv-rits/Manifest.
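
A toy sketch of the style-manifold idea (the parameterization and names are assumptions, not the actual ManiFest module): the target style is expressed as an interpolation between two anchor-domain styles plus a small deformation fitted on the few target images.

```python
import torch
import torch.nn as nn

class StyleManifold(nn.Module):
    """Toy style manifold: interpolate two anchor styles, then deform towards the target."""
    def __init__(self, style_dim=256):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))          # position on the anchor-to-anchor manifold
        self.delta = nn.Parameter(torch.zeros(style_dim))      # few-shot deformation of that point

    def forward(self, style_a, style_b):                       # styles of the two anchor domains
        mix = self.alpha * style_a + (1.0 - self.alpha) * style_b
        return mix + self.delta                                 # deformed style fed to the decoder
```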

    Influence of Fog on Computer Vision Algorithms

    Get PDF
This technical report describes a new preliminary approach to simulating fog in images using accurate physical and photometric models, in order to study the influence of small particles on computer vision algorithms.
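
For illustration, physics-based fog simulation is commonly written with the standard Koschmieder attenuation model; the sketch below assumes that formulation, per-pixel metric depth, and an RGB image in [0, 1] (the report's exact physical and photometric models may differ, and the function name is hypothetical):

```python
import numpy as np

def add_fog(image, depth, beta=0.05, airlight=0.8):
    """Koschmieder-style fog: radiance is attenuated with depth and blended with airlight."""
    transmission = np.exp(-beta * depth)[..., None]               # (H, W, 1) from metric depth in meters
    return image * transmission + airlight * (1.0 - transmission)
```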